Prisoner's Dilemma is a standard game commonly used in game theory (for a comprehensive definition of Prisoner's Dilemma see this link). Iterated Prisoner's Dilemma simply repeats the game and lets the agents choose again each round. Prisoner's Dilemma is implemented in a module I wrote called game_types. For an example using the PrisonersDilemma class see the pseudocode snippet below (I discuss the different types of strategies available in subsequent cells):
from game_types import PrisonersDilemma
...
...
...
agent1 = Strategy() # Some Strategy for agent1
agent2 = Strategy() # Some Strategy for agent2
game = PrisonersDilemma(agent1, agent2) # Play Prisoners Dilemma with agent1 and agent2
num_iter = 10000 # Play the game 10000 times
game.play(num_iter)
data = game.data # grab data from game
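Once a game has been played, the per-round utilities can be pulled out of game.data, which (as used throughout the cells below) is keyed by player: 'A' for agent1 and 'B' for agent2. A minimal sketch of how the results are summarized later in this notebook, assuming that data layout:
from collections import Counter
# game.data['A'] and game.data['B'] each hold one utility value per round
agent1_util_vals = Counter(game.data['A'])  # how often each utility value occurred
a1_total_score = sum(game.data['A'])        # total utility accumulated by agent1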
In this scenario Robert Brunner uploads a dataset and asks two graduate students, Edward and Will, to download it and perform a variety of tasks. For the sake of simplicity, in this game Edward and Will are the only ones on a special network and they receive no interference from other people.
The traffic on this network is governed by the TCP protocol, and one feature of TCP is its backoff mechanism. If the rates at which Edward and Will send packets to the network cause congestion, they back off: each reduces his rate for a while until the congestion subsides. This is the correct implementation. A defective implementation of TCP is one that does not back off when the network is congested. The users in this game have two choices: cooperate (use the correct implementation of the TCP protocol) or defect (use an incorrect implementation of the TCP protocol).
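To make the cooperate/defect distinction concrete, here is a purely illustrative sketch of a sender that either backs off under congestion (cooperate) or ignores congestion (defect). The rate numbers and the congestion flag are assumptions for illustration only; nothing like this function exists in game_types.
def next_send_rate(current_rate, congested, backs_off=True, min_rate=1.0, max_rate=100.0):
    """Toy model of a TCP-like rate update (illustrative only)."""
    if congested and backs_off:
        # Cooperating sender: cut the rate until congestion subsides
        return max(min_rate, current_rate / 2)
    # Defecting sender (or no congestion): keep ramping the rate up
    return min(max_rate, current_rate + 1.0)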
The payoff matrix is below. The numbers in each box are utility values; for this example, the higher the utility value, the faster the dataset is downloaded. Will is the first player and corresponds to the first number in each box. Edward is the second player and corresponds to the second number in each box.
If Edward and Will both follow the correct protocol they both download the dataset in a reasonable amount of time (top left). If Will decides he wants the data faster and uses a defective TCP implementation while Edward still follows the correct protocol, Will downloads the dataset much faster than Edward (top right), and vice versa (bottom left). If they both defect they download the dataset significantly slower than if they had both cooperated (bottom right).
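The exact utility values are defined inside game_types, but as an illustration the matrix described above could be encoded as a dictionary mapping joint actions to (Will, Edward) utilities. The numbers below are assumptions chosen only to match the ordering described in the text (a lone defector does best, mutual cooperation beats mutual defection), not necessarily the values PrisonersDilemma actually uses.
C, D = 'cooperate', 'defect'
# (Will's action, Edward's action) -> (Will's utility, Edward's utility); illustrative values only
payoff = {
    (C, C): (4, 4),  # both back off: reasonable download time for both
    (C, D): (1, 5),  # Edward defects: Edward downloads much faster than Will
    (D, C): (5, 1),  # Will defects: Will downloads much faster than Edward
    (D, D): (2, 2),  # both defect: both significantly slower
}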
The strategies of Will and Edward depend on what they want to achieve. If Edward's part of the work builds on Will's, it might make sense for Will to defect and for Edward to cooperate on purpose. If it's a competition, they may both defect to get a head start. If they split up the work, it may be in their interest to cooperate. Many scenarios can unfold, but in most games agents do not know the other agents' intentions or strategies, so for the sake of this game we assume Edward and Will know nothing about each other's intentions.
I've implemented the following basic strategies in a directory called strategies (a sketch of the kind of interface they share appears after the list):
Cooperate: With this strategy Will or Edward (the agent) always cooperates. To create an agent that uses this strategy you can do the following in Python:
from strategies import cooperate as c
agent = c.Cooperate()
Defect: With this strategy the agent always defects. To create an agent that uses this strategy you can do the following in Python:
from strategies import defect as d
agent = d.Defect()
Chaos: An agent uses this strategy to cause chaos by cooperating or defecting at random. To create an agent that uses this strategy you can do the following in Python:
from strategies import chaos as ch
agent = ch.Chaos()
Grim: This strategy is the most unforgiving. The agent cooperates until the opponent defects, in which case it will defect for the remainder of the game. This is also known as a trigger strategy. To create an agent that uses this strategy you can do the following in Python:
from strategies import grim as g
agent = g.Grim()
Pavlov: This is another trigger strategy. Initially the agent cooperates; when it loses a round it changes its action (defects), and it continues to switch its action whenever it loses. To create an agent that uses this strategy you can do the following in Python:
from strategies import pavlov as p
agent = p.Pavlov()
Q-Learning: This agent uses Q-learning, a model-free reinforcement learning technique, as its strategy. To create an agent that uses this strategy you can do the following in Python:
from strategies import machine_learning as ml
agent = ml.QLearn()
Human: This agent receives its action as input from a human player. To create an agent that uses this strategy you can do the following in Python:
from strategies import human as h
agent = h.Human()
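All of these strategies plug into the games the same way. As a rough sketch of the shared interface (the method names here are assumptions for illustration; the real interface lives in the strategies package), a trigger strategy like Grim needs little more than a flag recording whether the opponent has ever defected:
class GrimSketch:
    """Illustrative Grim trigger strategy (not the real strategies.grim)."""
    def __init__(self):
        self.opponent_defected = False  # trips permanently on the first defection
    def action(self):
        # Cooperate until the opponent has ever defected, then defect forever
        return 'defect' if self.opponent_defected else 'cooperate'
    def observe(self, opponent_action):
        if opponent_action == 'defect':
            self.opponent_defected = True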
In [1]:
%matplotlib inline
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
# Importing Games
from game_types import PrisonersDilemma
from game_types import Coordination
# Importing Strategies
from strategies import chaos as c
from strategies import defect as d
from strategies import machine_learning as ml
from strategies import pavlov as p
from strategies import grim as g
In [2]:
# Create agents and play the game for 10000 iterations
agent1 = c.Chaos()
agent2 = d.Defect()
game = PrisonersDilemma(agent1, agent2)
game.play(10000)
# Grab Data
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('Chaos Agent Vs Defect Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add([1,2,5],width-.05))
_ = ax.set_xticklabels(('1', '2', '5'))
_ = ax.legend((a1[0], a2[0]), ('Chaos Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Defect Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
In this scenario defecting is the dominant strategy: the agent is better off defecting no matter what the other agent does.
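Using the illustrative payoff values from the earlier sketch, dominance is easy to check: whatever the opponent does, defecting pays more than cooperating.
C, D = 'cooperate', 'defect'
# Illustrative payoffs from the earlier sketch (not the game_types values)
payoff = {(C, C): (4, 4), (C, D): (1, 5), (D, C): (5, 1), (D, D): (2, 2)}
# For player 1, defect beats cooperate against every opponent action
assert all(payoff[(D, other)][0] > payoff[(C, other)][0] for other in (C, D))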
In [3]:
# play the game
agent1 = g.Grim()
agent2 = p.Pavlov()
game = PrisonersDilemma(agent1, agent2)
game.play(10000)
# get data from game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('Grim Agent Vs Pavlov Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add([0,4,5],width/2))
_ = ax.set_xticklabels(('0', '4', '5'))
_ = ax.legend((a1[0], a2[0]), ('Grim Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Pavlov Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Both strategies start out cooperating. Grim never defects because Pavlov never defects, and Pavlov never loses a round so it never changes its action.
In [4]:
# Play the Game
agent1 = ml.QLearn()
agent2 = p.Pavlov()
game = PrisonersDilemma(agent1, agent2)
game.play(10000)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs Pavlov Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add([1,2,4,5],width/2))
_ = ax.set_xticklabels(('1', '2', '4', '5'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Pavlov Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Pavlov's simple rules outperform Q-learning here, which is interesting.
In [5]:
print(agent1_util_vals, agent2_util_vals)
In [6]:
# Play the Game
N = 10000
agent1 = ml.QLearn()
agent2 = c.Chaos()
game = PrisonersDilemma(agent1, agent2)
game.play(N)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs Chaos Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add(x2,width/2))
_ = ax.set_xticklabels(('1', '2', '4', '5'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Chaos Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Q-learning significantly outperforms the Chaos agent because the Q-learning agent learns fairly quickly that defecting yields the highest expected utility (discussed more in the appendix).
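For reference, the tabular Q-learning update the agent is (roughly) performing looks like the standard rule sketched below. The state representation, learning rate, and discount used by ml.QLearn are parameters of that class (some appear in a later cell), so treat this as a generic sketch rather than its actual implementation. With the illustrative payoffs used earlier, defecting against a uniformly random opponent has expected utility 3.5 versus 2.5 for cooperating, which is consistent with the behaviour the plot shows.
def q_update(Q, state, action, reward, next_state, lr=0.1, decay=0.9):
    """One standard tabular Q-learning update (generic sketch)."""
    best_next = max(Q[(next_state, a)] for a in ('cooperate', 'defect'))
    target = reward + decay * best_next                  # bootstrapped return estimate
    Q[(state, action)] += lr * (target - Q[(state, action)])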
In [7]:
# Play the Game
N = 10000
agent1 = ml.QLearn()
agent2 = ml.QLearn()
game = PrisonersDilemma(agent1, agent2)
game.play(N)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs QLearning Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add(x2,width/2))
_ = ax.set_xticklabels(('1', '2', '4', '5'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'QLearning Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Here both Q-learning agents tend to mirror each other. I assume this is because they have the same initial parameters, which yield the same expected utilities.
In [8]:
# Play the Game
N = 200000 # Play a longer game
# agent 1's parameters are a bit more short-sighted
agent1 = ml.QLearn(decay=0.4, lr=0.03, explore_period=30000, explore_random_prob=0.4, exploit_random_prob=0.2)
# agent 2's parameters weigh the future more heavily
agent2 = ml.QLearn(decay=0.6, lr=0.2, explore_period=40000, explore_random_prob=0.4, exploit_random_prob=0.1)
game = PrisonersDilemma(agent1, agent2)
game.play(N)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs QLearning Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add(x2,width/2))
_ = ax.set_xticklabels(('1', '2', '4', '5'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'QLearning Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
(I haven't had time to look through the actions of both agents, but one is short-sighted and the other is not, which yields the orange Q-learning agent a higher total utility score.)
In [9]:
print(agent1_util_vals, agent2_util_vals)
In this scenario Vincent and Maghav want to see different movies. Vincent wants to see Guardians of the Galaxy 2 and Maghav wants to see Wonder Woman. Each is willing to see the movie he doesn't really care for, but neither wants to see a movie alone. Each has two choices: defect (go to the other person's movie) or cooperate (go see the movie he wants). The payoff matrix is below:
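As with Prisoner's Dilemma, the actual values are defined in game_types, but an illustrative encoding of the coordination payoffs might look like the dictionary below. The 0/1/2 numbers are assumptions chosen to be consistent with the utility values that appear in the plots that follow.
C, D = 'cooperate', 'defect'  # cooperate = see your own movie, defect = go to the other person's
# (Vincent's action, Maghav's action) -> (Vincent's utility, Maghav's utility); illustrative only
payoff = {
    (C, C): (0, 0),  # each goes to his own movie: both end up alone
    (C, D): (2, 1),  # both end up at Vincent's movie
    (D, C): (1, 2),  # both end up at Maghav's movie
    (D, D): (0, 0),  # they swap movies and miss each other again
}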
In [10]:
# Create agents and play the game for 10000 iterations
agent1 = c.Chaos()
agent2 = d.Defect()
game = Coordination(agent1, agent2)
game.play(10000)
# Grab Data
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('Chaos Agent Vs Defect Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add([0,1, 2],width-.05))
_ = ax.set_xticklabels(('0','1','2'))
_ = ax.legend((a1[0], a2[0]), ('Chaos Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Defect Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Here Defect isn't a dominant strategy. The Defect agent only receives a nonzero utility value when the Chaos agent goes to the movie it originally intended to see, so the two end up together. A mixed strategy is needed.
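With the illustrative coordination payoffs above, the mixed-strategy equilibrium comes from making each player indifferent between his two actions. A quick sketch of that indifference calculation (again using the assumed 0/1/2 values, not the real game_types numbers):
# Suppose the opponent cooperates (sees his own movie) with probability q.
# Cooperating then yields 2*(1 - q) in expectation and defecting yields 1*q,
# so indifference requires 2*(1 - q) = q, i.e. q = 2/3 (and symmetrically for the other player).
q = 2 / 3
u_cooperate = 2 * (1 - q)   # expected utility of seeing your own movie
u_defect = 1 * q            # expected utility of going to the other person's movie
assert abs(u_cooperate - u_defect) < 1e-9  # both actions yield the same expected utility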
In [11]:
print(agent1_util_vals,agent2_util_vals)
In [23]:
# play the game
agent1 = g.Grim()
agent2 = p.Pavlov()
game = Coordination(agent1, agent2)
game.play(10000)
# get data from game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('Grim Agent Vs Pavlov Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add([0,1,2],width/2))
_ = ax.set_xticklabels(('0', '1', '2'))
_ = ax.legend((a1[0], a2[0]), ('Grim Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Pavlov Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
Grim loses in the first round and from then on always goes to the other movie. The Pavlov agent even won a round where they both ended up at the same movie, so it never changed its strategy.
In [21]:
print(agent1_util_vals, agent2_util_vals)
In [22]:
# Play the Game
N = 10000
agent1 = ml.QLearn()
agent2 = c.Chaos()
game = Coordination(agent1, agent2)
game.play(N)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs Chaos Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add(x2,width/2))
_ = ax.set_xticklabels(('0', '1', '2'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'Chaos Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
This is different from Prisoner's Dilemma: the Q-learning agent tries to coordinate with the Chaos agent but can never predict which movie it will go to.
In [19]:
# Play the Game
N = 10000
agent1 = ml.QLearn()
agent2 = ml.QLearn()
game = Coordination(agent1, agent2)
game.play(N)
# Get Data from Game
agent1_util_vals = Counter(game.data['A'])
agent2_util_vals = Counter(game.data['B'])
a1_total_score = sum(game.data['A'])
a2_total_score = sum(game.data['B'])
# Plot the results
x1, y1, x2, y2 = [], [], [], []
for i, j in zip(agent1_util_vals, agent2_util_vals):
x1.append(i)
y1.append(agent1_util_vals[i])
x2.append(j)
y2.append(agent2_util_vals[j])
fig, ax = plt.subplots(figsize=(12,6))
width = 0.35
a1 = ax.bar(x1, y1, width, color='#8A9CEF')
a2 = ax.bar(np.asarray(x2)+width, y2, width, color='orange')
_ = ax.set_title('QLearning Agent Vs QLearning Agent')
_ = ax.set_ylabel('Number of Games')
_ = ax.set_xlabel('Utility Values')
ax.set_xticks(np.add(x2,width/2))
_ = ax.set_xticklabels(('0','1', '2'))
_ = ax.legend((a1[0], a2[0]), ('QLearning Agent\nTotal Utility Score: {}'.format(str(a1_total_score)),
'QLearning Agent\nTotal Utility Score: {}'.format(str(a2_total_score))), loc=1, bbox_to_anchor=(1.35, 1))
plt.show()
I'm still playing around with this one, but both agents do pretty badly here.
In [16]:
print(agent1_util_vals, agent2_util_vals)